The document is organized as follows: The section Data data gives an overview of the data used for the investigations of viral load at the first positive test, viral load between B.1.1.7 and non-B.1.1.7 cases, viral load time course, and viral load - culture positivity. The section Analysis describes the analysis approach, implements the analysis, verifies that model parameters were estimated reliably, and that the employed statistical models capture important characteristics of the observed data. Here, the description of the analysis approach includes that descriptions of priors for the Bayesian analysis as well as prior predictive checks to examine that appropriate priors were chosen. The section Results calculates the posterior predictions, from which all reported results are derived, and generates the results presented in the main paper as well as supplementary analyses. The sections Article abstract and results and Figures generate the numbers and figures for the results section of the main article.

1 Data

The analysis uses following data files:

1.1 RT-PCR test data

In this data analysis, subjects are classified into one of the following groups:

  • Likely hospitalized: This includes subjects with a positive test, where the sample was taken in a stationary hospital ward. See table 1.1 groups of testing centres or locations
  • Likely pre-or asymptomatic (PAMS): This includes subjects where the first positive test was taken in a testing centre established for SARS-CoV-2 testing and who later did not tested positive in a hospital ward.
  • Other: All subjects that do not fall into one of the first two groups.

Table 1.1 gives an overview of the number of test, subjects and positive tests in the different test centre types (testing locations).

Table 1.1: Test centre categories and test counts for 393,129 subjects.*
Status Centr abbr. Centre type N centres N positive N tests N individuals Mean VL % hosp. Positivity rate
Likely PAMS C19 COVID-19 testing centre 23 5682 147202 69233 6.9 1 3.9
Likely hospitalized WD Ward 543 3871 193194 96293 6.0 100 2.0
Likely hospitalized H Hospital 158 1895 65609 36915 6.3 100 2.9
Likely hospitalized ICU Intensive care unit 15 563 7762 4790 6.2 100 7.3
Likely hospitalized IDW Infectious diseases ward 1 60 174 144 5.9 100 34.5
Other ED Emergency department 141 7418 173536 121501 6.5 29 4.3
Other OD Outpatient department 214 667 57525 42044 5.6 7 1.2
Other AIR Airport 2 487 41907 36799 6.0 1 1.2
Other ? Unclassified 905 581 32323 20657 5.9 7 1.8
Other RES Age residence 5 1056 22868 8873 6.0 5 4.6
Other PRI Prison 16 193 12932 5180 5.5 0 1.5
Other CP Company physician 7 202 12406 6747 5.8 2 1.6
Other LW Labour ward 31 89 11157 8973 5.6 17 0.8
Other L Other laboratory (not Labor Berlin) 35 451 9852 6662 6.0 0 4.6
Other SM Sports Medicine 1 114 5477 1230 5.2 0 2.1
Other PHD Public health department 5 70 813 657 5.5 0 8.6
Other FM Forensic medicine 1 25 101 71 7.2 0 24.8
* First-positive RT-PCR tests are broken down according to testing centre type. Each centre type category includes data from many different test centres. A category of Other is assigned to all those who were not at a centre testing people who were likely PAMS or likely hospitalized. For example, emergency departments fall into the Other category because subjects presenting there are not likely to be asymptomatic, but are not counted as formally hospitalized at that point.

The following table and figures provide an overview over the data used in the analysis of first positive RT-PCR tests.

Table 1.2: Viral load by age groups for the full sample, pre-, asymptomatic, and mild- (PAMS) cases, and hospitalized cases
Full sample
PAMS
Hospitalized
Age category N pos. pos. rate log10load N time N pos. pos. rate log10load N pos. pos. rate log10load
0-5 271 1.6 0.1 (0.75) 14 22 3.5 0.2 (1.19) 25 0.8 0 (0.53)
5-10 151 1.6 0.1 (0.77) 6 33 5.6 0.3 (1.41) 16 1.3 0.1 (0.7)
10-15 211 2.2 0.1 (0.92) 8 50 7.0 0.4 (1.69) 19 1.4 0.1 (0.74)
15-20 591 3.0 0.2 (1.1) 35 174 5.1 0.3 (1.51) 111 2.7 0.2 (1.03)
20-25 1517 3.2 0.2 (1.19) 101 646 4.1 0.3 (1.4) 229 2.8 0.2 (1.03)
25-35 4133 3.0 0.2 (1.15) 299 1856 4.0 0.3 (1.41) 570 2.3 0.1 (0.94)
35-45 3109 2.7 0.2 (1.09) 292 1158 3.6 0.2 (1.32) 533 2.1 0.1 (0.89)
45-55 3075 3.1 0.2 (1.14) 362 936 3.5 0.2 (1.31) 682 2.4 0.1 (0.94)
55-65 3052 2.8 0.2 (1.08) 572 629 3.2 0.2 (1.25) 959 2.2 0.1 (0.91)
>65 7314 3.1 0.2 (1.16) 2330 132 5.7 0.4 (1.62) 3245 2.5 0.2 (1.03)
* N pos. = number of positive tests, pos. rate = positivity rate, N time = number of cases with time course data.
Number of ositive tests over time.

Figure 1.1: Number of ositive tests over time.

The cobas and LC480 PCR systems appear to use result in different distributions of viral loads. This is especially suggested by the comparison of viral load values that come from the same test centre and were tested on the two systems, and which is shown in 1.3.

Distribution of viral load by PCR system and sub-samples

Figure 1.2: Distribution of viral load by PCR system and sub-samples

Distribution of viral load values from the test centre with most positive test results, split by PCR system.

Figure 1.3: Distribution of viral load values from the test centre with most positive test results, split by PCR system.

Distribution of subject ages by group.

Figure 1.4: Distribution of subject ages by group.

Distribution of viral loas by subject ages and group.

Figure 1.5: Distribution of viral loas by subject ages and group.

1.2 B.1.1.7 data

Here is a short overview of data with the B.1.1.7 variant of the virus, together with different comparison groups. The reasoning behind the ever-more restrictive comparison groups is to compare B.1.1.7 subjects to non-B.1.1.7 subjects that were tested in increasing temporal- and spatial proximity.

Distribution of viral load for B.1.1.7 and non-B.1.1.7  cases for different data sub-sets.

Figure 1.6: Distribution of viral load for B.1.1.7 and non-B.1.1.7 cases for different data sub-sets.

Table 1.3: Sample description for focussed B.1.1.7 analysis.
B117 N Mean age Mean viral load % hospitalized % PAMS
B.1.1.7 362 51 (49, 54) 7.4 (7.2, 7.5) 35 (27, 42) 25.1 (19.2, 31.1)
non-B.1.1.7 492 54 (52, 56) 6.6 (6.4, 6.7) 28 (23, 34) 31.9 (25.9, 38.0)
Numbers are means and 95% confidence intervals.

A crude comparison of log10 viral load of B.1.1.7 and non-B.1.1.7 subjects suggests higher viral load in the former group in adults (see Figure 1.7). There is little data from children.

Viral load by age group, PAMS status and B117. Lines indicate 95% credible intervals. CIs cannot be shown when a group has only one member. This comparison uses the most restrictive data set, that includes B.1.1.7 cases only when there was a test result for a non-B.1.1.7 centre from the same location within +/- one day.

Figure 1.7: Viral load by age group, PAMS status and B117. Lines indicate 95% credible intervals. CIs cannot be shown when a group has only one member. This comparison uses the most restrictive data set, that includes B.1.1.7 cases only when there was a test result for a non-B.1.1.7 centre from the same location within +/- one day.

1.3 RT-PCR time course data

For a subset of participants, multiple RT-PCR test results were available. These data can be used to estimate the time course of viral load. For this analysis we used data from participants with 3 or more RT-PCR results.

To enable a robust estimation viral load time courses, we only included cases who fulfilled following criteria:

  • at least 3 test results.
  • test results must cover a period of at least 5 days
  • if the first and the last test are negative, the results need to cover a period of at least 10 days.
  • at most an increase of five log10 load per day from the first to the second test
Histogram of data available for time course analysis

Figure 1.8: Histogram of data available for time course analysis

The data set has few patients who are younger than 20 years.

Table 1.4: Sample for time course analysis.
Number of tests N Mean age % leading neg. test % trailing neg. test % hospitalized % PAMS
3 1836 61 (60, 62) 21 (18, 23) 32 (29, 35) 75 (67, 83) 10.1 (8.5, 11.6)
4 1005 67 (66, 68) 29 (25, 32) 45 (39, 51) 83 (69, 97) 4.7 (3.3, 6.0)
5-6 825 70 (69, 71) 39 (34, 45) 47 (41, 54) 82 (68, 97) 1.9 (1.0, 2.9)
7-9 293 72 (70, 74) 47 (36, 58) 59 (45, 72) 88 (57, 100) 0.3 (0.0, 1.0)
>9 60 75 (72, 79) 42 (20, 63) 63 (30, 97) 97 (0, 100) 0.0 (0.0, 0.0)
Numbers are means and 95% confidence intervals

Few participants with time course data never had a positive test outside a hospital ward, i.e. were classified as PAMS. These participants are almost exclusively among subjects with three or four data points.

Observed viral load time courses. Red dots are PCR results. Data points are jittered to reduce overlap of points.

Figure 1.9: Observed viral load time courses. Red dots are PCR results. Data points are jittered to reduce overlap of points.

1.4 Culture data

Here we briefly describe data from three published articles Kampen et al. (2020) that assessed infectivity of samples with different viral loads by testing if the virus in a sample could replicate, i.e. result in a positive culture.

Distribution of viral loads for data considered for the estimation of the association between viral load and culture positivity.

Figure 1.10: Distribution of viral loads for data considered for the estimation of the association between viral load and culture positivity.

We estimated a Bayesian hierarchical logistic regressions in rstanarm (Goodrich et al. 2020) for the three datasets to investigate if the association between viral load and positive culture outcome is sufficiently similar to pool the data from these studies:

Estimated association between viral load and culture positivity by data set.

Figure 1.11: Estimated association between viral load and culture positivity by data set.

The associations of the data from Wölfel and Perera are similar enough to warrant pooling the culture data. The data from van Kampen will not be included, as it shows a different association between viral load and culture positivity.

2 Analysis

2.1 Viral load and infectiousness at the first positive test across age

We use a Bayesian thin-plate spline regression (Wood 2003) as implemented in the brms package (Bürkner 2017, 2018) to estimate the association between age and viral load. Following specifications account for structure in the data:

  • We model differences in viral load between sexes, samples measured on different PCR systems, samples from the B.1.1.7 variant, and samples from different test centre categories (see Table 1.1)
  • We allow the association between age and viral load to vary, dependent on whether subjects fall into the PAMS, Hospitalized, or Other group
  • We allow for variation of residual variance, dependent on age group and test centre category
  • We implement a robust regression approach by using a student-t distributed error term.

As described earlier, cases are categorized in the groups (see also Table 1.1):

  • PAMS
  • Hospitalized
  • Other

The basic brms model is

 bf(log10Load ~ Group + B117 + PCR + Male + s(Age, by = Group) + (1 | TestCentreCategory/Hospitalized) ,
    family = student(),
    sigma ~ PCR + (1 | fAgeGroup) + (1 | TestCentreCategory))

wherein

  • ~ Group models the population level effect of being part of the “PAMS,” “Hospitalized,” or “Other” group
  • B117 + PCR + Male models effects of PCR system and gender
  • + (1 | TestCentreCategory/Hospitalized) models differences between testing centre categories and hospitalized and non-hospitalized individuals within testing centre categories with random effects.
  • + s(Age, by = Group) implements a thin-plate spline model for the effect of Age, with separate spline coefficients for groups defined by case group.
  • family = student() uses student-t distributed error variance which implements a robust regression that reduces the influence of outliers,
  • sigma ~ PCR + (1 | fAgeGroup) + (1 | TestCentreCategory) models age-group specific error variances.

In addition, we modified the basic brms model to limit the intercept parameters to values above 0, and we added a simple imputation procedure to deal with missing sex/gender information. In particular, we use a variable “Male” that is set to one for male subjects and zero for female subjects. For subjects with missing gender information a value between 0 and 1 is imputed, without an explicit imputation model.

2.1.1 Check successful estimation

Next we check R-hat (Vehtari et al. 2020) and number of divergent iterations (Betancourt 2016) to verify a successful model estimation.

R-hat values for viral load and age models

Figure 2.1: R-hat values for viral load and age models

The maximal R-hat value is 1.015 and there are 1 divergent iterations out of a total of 4000 iterations, which indicates that the model converged.

2.1.2 Posterior predcitive check

To check if the current analysis describes the observed data well, we use a visual posterior predictive check. We examine whether the observed means and standard deviations of group-wise viral loads are within the 90% credible interval of the posterior predictive distribution. Figure 2.2 shows observed and estimated viral load means and standard deviations by age group.

Observed (points and vertical lines) and estimated (horizontal line and confidence band) age-wise mean viral loads with confidence and credible intervals, respectively.

Figure 2.2: Observed (points and vertical lines) and estimated (horizontal line and confidence band) age-wise mean viral loads with confidence and credible intervals, respectively.

Observed and estimated viral loads for subjects younger than 26 years stratified by subject group.

Figure 2.3: Observed and estimated viral loads for subjects younger than 26 years stratified by subject group.

The next plots compare directly-estimated and observed mean and standard deviations for age groups (see Figure 2.4) subject group (see Figure 2.5) and test center category (see Figure 2.6).

Observed (red dots) and estimated means and variance of viral load by age group.

Figure 2.4: Observed (red dots) and estimated means and variance of viral load by age group.

Observed (red dots) and estimated means and variance of viral load by sub-sample

Figure 2.5: Observed (red dots) and estimated means and variance of viral load by sub-sample

Observed (red dots) and estimated means and variance of viral load by test centre group

Figure 2.6: Observed (red dots) and estimated means and variance of viral load by test centre group

2.2 Association of viral load probability of a positive culture

To estimate the association between viral load and probability of positive culture we estimate a logistic regression in brms.

2.3 B.1.1.7 viral load at the first positive test

The B.1.1.7 cases in this sample are from a restricted time period (starting January 2021) and from small number of test locations. Because the appearance of B.1.1.7 cases can have led to changing testing strategies, this analysis restricts the comparison of B.1.1.7 and non-B.1.1.7 cases to test results that were obtained either one day prior, the same day, or one day after the positive B.1.1.7 cases. For instance, if test centre A had only one positive test on January 15th, and centre B had only one positive test on January 20th, the analysis would only use positive test results from January 14th-16th from test centre A and test results January 19th-21st from test centre B. Further, this analysis adjusts for age, PCR system, gender, and models effects of test centres as random effects.

To examine the sensitivity of results to different analysis strategies, we estimate the B.1.1.7 with different data (c.f. 1.4)

  • the full data set
  • B.1.1.7 cases and non-B.1.1.7 cases within +/- 5 days of the B.1.1.7 cases
  • B.1.1.7 cases and non-B.1.1.7 cases within +/- 1 day of the B.1.1.7 cases
  • non-B.1.1.7 cases within +/- 1 day of the B.1.1.7 cases, and B.1.1.7 cases only from testing centres/locations that also reported non-B.1.1.7 cases within +/- 1 day of the B.1.1.7 cases

and with different regression model

  • unadjusted regression
  • uanadjusted regression with random effects for testing centres/locations
  • adjustment for gender, age, PCR system, and subject group with random effects for testing centres/locations

2.4 Viral load and infectiousness time course

2.4.1 A model for estimating viral load time course without knowledge of day of infection or peak viral load

In this analysis, we use the time course data described above to estimate how viral load and infectiousness, indexed by culture positivity, evolve over time. Because the day of infection is not known, we estimate it simultaneously with the viral load time course. The model has following basic properties:

  • log10 viral load is assumed to increase linearly up to the maximum viral load and thereafter decreases linearly.
  • day zero is the day of peak viral load, which is modeled with a parameter intercept
  • the linear increase of log10 viral load from the day of infection to peak viral load is a modeled with a slope parameter slope_up
  • the linear decrease has a slope parameter slope_down
  • we use a logistic weighting function (weight_down = inv_logit(day*b)) to calculate log10 viral load at any point in time as the average of the up and down slopes: yhat = load_up * (1-weight_down) + load_down * weight_down. This implements a smooth transition from increasing to declining viral load.

We use random effects models for intercept, slope_up, and slope_down to estimate individual level trajectories. In short, individual level parameters \(p^g_i\) are estimated as

\[log(p_i) = \alpha + \beta X_i + \gamma s(age) + \rho_gZ_i\]

where \(\alpha\) is the sample or grand mean (see below for priors) the vector \(\beta\) has weights for the fixed effects of gender, PAMS, B.1.1.7, and hospitalization status in vector \(X_i\). \(\gamma s(age)\) represent non-linear effects of age modeled with restricted cubic splines. \(Z_i\) is a vector with random effects variables and \(\rho_h\) are the associated weights for group \(g\) and \(\rho_g\sim N(0,\sigma)\), with \(\sigma\) being the random effects deviation. To obtained the negative slopes for the viral load decay \(log(p_i)\) is multiplied with -1. We estimate for all model parameters subject level random effects. In addition, we estimate for the intercept the effect of the centre in which the first test was taken and for the down slope the effect of the centre (typically a hospital unit or ward) which the patient stayed for the longest time. These two latter effects are also estimated as random effects.

To align observed viral loads with with estimated viral load time courses, we estimate for each subject a “shift” parameter, which captures the number of days between peak viral load and the first test. We model the basic shift with a uniform prior from -10 to 20 days and in addition model differences in shifts between participants that had thier first test in different centres as random effects.

We use following priors for sample level means (\(alpha\) above):

  • \(log(intercept)\) (peak viral load): \(N(2.1,0.15)\). This prior distribution has a mean 8.3, a 10% quantile of 6.7, and a 90% quantile of 9.9.
  • \(log(slope_{up})\) (attack rate): \(N(0.6,0.25)\). This prior distribution has a mean 1.9, a 10% quantile of 1.3, and a 90% quantile of 2.5,
  • \(-log(slope_down)\) (decay rate): \(N(-1.75,0.5)\). This prior distribution has a mean 0.2, a 10% quantile of 0.09, and a 90% quantile of 0.33.

Priors for random effects variances ($above) are \(gamma(4,20)\) for log-10 load model parameters, \(gamma(5,4)\) for shifts. Random effects for viral load parameters are added on the log scale. Priors for fixed effects, which are added on the log scale (\(\beta\) above) are \(N(0,.125)\).1

The statistical model was implemented in Stan (Carpenter et al. 2017). The complete Stan model can be found at then end of this document.

2.4.2 Data for the Stan model

2.4.3 Model priors and prior predictions

A first step of the modelling workflow is to examine the prior predictive distributions of the model. Here, we check if the model predictions for suject-level parameters are roughly in line with the prior information about the time course of a SARS Cov-19 infection, namely a duration between 3 and 7 days from infection to peak viral load, a peak viral load of around 8, and decay rate of around 0.15. To obtain prior predictive distributions, we sample from the model without conditioning on the data.

Figure 2.7 illustrates the model and describes key model parameters.

Components of the two-slopes model used to estimate viral load time courses. Top: Priors for grand mean of model parameters. The prior for time to peak viral load is the ratio of the priors for slope up and intercept.  Second row: Components of exemplary viral load trajectories. Each line is generated from the prior distribution of group level means in the top row. Third row: Weighting function for the combination of up- and down-slopes. Bottom: Exemplary viral load trajectories. In the full model, individual level parameters also depend on random effects, individuals' age, gender, PAMS status, and PCR system effects.

Figure 2.7: Components of the two-slopes model used to estimate viral load time courses. Top: Priors for grand mean of model parameters. The prior for time to peak viral load is the ratio of the priors for slope up and intercept. Second row: Components of exemplary viral load trajectories. Each line is generated from the prior distribution of group level means in the top row. Third row: Weighting function for the combination of up- and down-slopes. Bottom: Exemplary viral load trajectories. In the full model, individual level parameters also depend on random effects, individuals’ age, gender, PAMS status, and PCR system effects.

Figure 2.8 shows that the subject-level parameters, which in addition to sample levels means shown in Figure 2.7 also depend on additional fixed and random effects, have a wider distribution than the priors for group level means.

Prior distributions for group level means and prior predictive distributions of individual level parameters. Dotted lines show the prior distributions of group level means, which are also shown in the previous plot. Solid lines show prior predictive distributions of subject-level parameters, which are obtained by adding random effects and effects of covariates (age, gender, PAMS), to the group-level means.

Figure 2.8: Prior distributions for group level means and prior predictive distributions of individual level parameters. Dotted lines show the prior distributions of group level means, which are also shown in the previous plot. Solid lines show prior predictive distributions of subject-level parameters, which are obtained by adding random effects and effects of covariates (age, gender, PAMS), to the group-level means.

2.4.4 Model estimation

To estimate model parameters, we used 4 independent chains, each with 1000 warm-up and 1000 post-warm up samples. Prior analysis showed that 4000 warm-up samples produced sufficient effective samples to reliably estimate 90% credible intervals for parameters of interest. Here, we only show the code for the analysis of the full date set (which takes around 6 hours with 4 parallel chains on an Amazon EC2 c5d.2xlarge instance).

2.4.5 Check successful estimation

Next, we check R-hat and number of divergent iterations to verify successful model estimation. R-hat values should be below 1.1, or even better below 1.01.

R-hat values and number of effective samples sizes (ESS) for model parameters. Top row: population level parameters. Bottom row: Subject level parameters.

Figure 2.9: R-hat values and number of effective samples sizes (ESS) for model parameters. Top row: population level parameters. Bottom row: Subject level parameters.

The R-hat values indicate that all population-level parameters converged. However, a small number of subject-level parameters have R-hat values above 1.1. Inspection of the 118 relevant subjects shows that this is due to viral load times courses that could be at the onset or later in the infection, resulting in bi-modal distributions of these subjects’ shift-parameters (see 2.11).

The estimation resulted in 0.25% divergent iterations. Optimally, there should be no divergent iterations, though this is still a matter of current research and a small number of divergent iteration is not generally seen as problematic.

2.4.6 Posterior predcitive check

To check that the model adequately describes the data, we plot observed data together with model predictions as a posterior predictive check (Gabry et al. 2019). Figure 2.10 shows individual-level expected linear upwards and downwards slope of viral load given data, model, and estimated parameters. Note that the observed data points do not need to be and are not typically enveloped by the 90% credible interval of the expected viral load.

Posterior predictive check for the estimated 2-slopes time course. Blue lines are expected viral load time courses, i.e. the average over all time courses calulated from the posterior distribution of a subjects' parameters. The shaded blue region indicate the 90% credible interval of the expection. x are observed measurements. Red points are observed measurements after shifting them for alignment in time and imputation of viral loads for negative tests.

Figure 2.10: Posterior predictive check for the estimated 2-slopes time course. Blue lines are expected viral load time courses, i.e. the average over all time courses calulated from the posterior distribution of a subjects’ parameters. The shaded blue region indicate the 90% credible interval of the expection. x are observed measurements. Red points are observed measurements after shifting them for alignment in time and imputation of viral loads for negative tests.

Posterior predictive check for subjects with at least one subject-level parameter with an R-hat value larger than 1.1. Parameters for which R-hat values are larger than 1.1 are shown in the top right corner. imp_neg is the imputed viral load for negative tests, intercept_raw is the subject level random effect for peak load, intercept is the subject's peak load (the sum of population average, population effects of age, PAMS, etc, and subject-level random effect.)

Figure 2.11: Posterior predictive check for subjects with at least one subject-level parameter with an R-hat value larger than 1.1. Parameters for which R-hat values are larger than 1.1 are shown in the top right corner. imp_neg is the imputed viral load for negative tests, intercept_raw is the subject level random effect for peak load, intercept is the subject’s peak load (the sum of population average, population effects of age, PAMS, etc, and subject-level random effect.)

Figure 2.12 shows the data together with posterior predictions, i.e., posterior expectations plus error variance and their 90% credible intervals.

Posterior predictive plot with 90% prediction intervals.

Figure 2.12: Posterior predictive plot with 90% prediction intervals.

2.4.7 Priors and posteriors for key model parameters

To verify that the priors for key parameters, i.e. grand mean for intercept, slope_up, slope_down, and time_to_peak load do not determine the results, figure 2.13 shows the posterior distribution of these parameters together with the prior distributions.

Priors and posteriors for key model parameters. The prior density is shown as a black outline. The posterior density is filled blue.

Figure 2.13: Priors and posteriors for key model parameters. The prior density is shown as a black outline. The posterior density is filled blue.

2.4.8 Estimated shifts

Figure 2.14 indicates that relatively few first-positive tests were estimated to be taken at peak viral load.

Posterior distribution of shifts. Each line represents the outline of the histogram over individual from one posterior sample. The prior for shift was a uniform distribution from -10 to 25.

Figure 2.14: Posterior distribution of shifts. Each line represents the outline of the histogram over individual from one posterior sample. The prior for shift was a uniform distribution from -10 to 25.

Participants with a negative first test were typically assumed to have had the first test prior to peak load, whereas participants whose first test was positive received shifts such that the lower the maximal observed load for the participant was, the later in the infection time course the first test was assumed to be taken.

Association between estimated shift and maximum oberved load for participants. Color indicates if a participant had a leading negative test, and if viral load increased or decreaed over the first two tests.

Figure 2.15: Association between estimated shift and maximum oberved load for participants. Color indicates if a participant had a leading negative test, and if viral load increased or decreaed over the first two tests.

Finally, we can show the shifted data points for all participants:

The plot shows the placement in time of 16,829 RT-PCR viral load values from 4019 subjects with at least three RT-PCR results. Points with central black dots indicate the first test of a subject. Because RT-PCR tests have a limit of detection of around 2 log10 copies (when dilution is accounted for) and false negatives are more likely when the true viral load is low, we imputed viral load values of negative tests (shown in red, with observed positive-test viral loads in blue). The permissible range of imputed values is -Inf to +3. Note that choosing a lower upper limit for trailing negative tests would lead to slightly steeper decrease in viral load. Negative imputed values are allowed to capture situations in which a leading negative test is followed several days later by low, increasing viral loads. In this scenario the infection happened between the leading negative test and the first positive test. The inset in the bottom right corner shows that fitting a line through the first positive tests means that the estimated log10 viral load at the time of the leading negative test has to be negative. The negative values for imputed log10 viral loads should not be interpreted as suggesting the presence of a fractional virus particle. Instead, by allowing imputation of negative log10 viral loads, we calculate a more accurate estimation of increasing viral loads at the beginning of the infection, based on these negative tests that may miss small viral concentrations that are below the limit of detection.

Figure 2.16: The plot shows the placement in time of 16,829 RT-PCR viral load values from 4019 subjects with at least three RT-PCR results. Points with central black dots indicate the first test of a subject. Because RT-PCR tests have a limit of detection of around 2 log10 copies (when dilution is accounted for) and false negatives are more likely when the true viral load is low, we imputed viral load values of negative tests (shown in red, with observed positive-test viral loads in blue). The permissible range of imputed values is -Inf to +3. Note that choosing a lower upper limit for trailing negative tests would lead to slightly steeper decrease in viral load. Negative imputed values are allowed to capture situations in which a leading negative test is followed several days later by low, increasing viral loads. In this scenario the infection happened between the leading negative test and the first positive test. The inset in the bottom right corner shows that fitting a line through the first positive tests means that the estimated log10 viral load at the time of the leading negative test has to be negative. The negative values for imputed log10 viral loads should not be interpreted as suggesting the presence of a fractional virus particle. Instead, by allowing imputation of negative log10 viral loads, we calculate a more accurate estimation of increasing viral loads at the beginning of the infection, based on these negative tests that may miss small viral concentrations that are below the limit of detection.

2.4.9 Key model parameters

Next we show the estimate group-level parameters and their correlations:

Posterior distribution and covariations of population means for key model parameters.

Figure 2.17: Posterior distribution and covariations of population means for key model parameters.

3 Results

3.1 Viral load across age: Null hypothesis significance testing

Here we show the results of the Hypothesis significance tests, split by sub-sample (PAMS vs. unclear/all)

Sample Comparison Difference pWelch t-stat t df pMW
all 0-5 vs 20-65 -0.7 (-0.92, -0.49) 0.00 -6.44 280.52 0.00
all 5-10 vs 20-65 -0.55 (-0.82, -0.29) 0.00 -4.08 153.73 0.00
all 10-15 vs 20-65 -0.41 (-0.65, -0.17) 0.00 -3.32 216.34 0.00
all 15-20 vs 20-65 -0.2 (-0.35, -0.05) 0.01 -2.56 636.06 0.01
PAMS 0-5 vs 20-65 -0.76 (-1.68, 0.16) 0.10 -1.71 21.13 0.07
PAMS 5-10 vs 20-65 -1 (-1.62, -0.38) 0.00 -3.30 32.43 0.00
PAMS 10-15 vs 20-65 -0.56 (-1.1, -0.01) 0.04 -2.06 49.84 0.03
PAMS 15-20 vs 20-65 -0.27 (-0.54, 0.01) 0.06 -1.93 184.94 0.06
Hospitalized 0-5 vs 20-65 -0.42 (-1.22, 0.39) 0.30 -1.05 28.30 0.14
Hospitalized 5-10 vs 20-65 -0.43 (-1.4, 0.54) 0.36 -0.94 17.13 0.23
Hospitalized 10-15 vs 20-65 -0.22 (-1.2, 0.76) 0.64 -0.47 19.14 0.41
Hospitalized 15-20 vs 20-65 -0.04 (-0.38, 0.3) 0.83 -0.22 131.27 0.70
The difference is given with 95% confidende intervals, p = p-value,
t = t-test, df = degrees of freedom, MW = Mann-Whitney U test

3.2 Viral load across age: Bayesian analysis

To compute estimates of age- and subject-group-wise viral loads and viral load differences, we generate a new articifial data set , with which we can obtain posterior predictions for hypothetical individuals stratified by, age and testing center, and subject group. Note that we set the value for PCR system to the more frequently used cobas system (81%) and set the value for Male to 0.5. By marginalizing over co-variates, in particular testing centre, and comparing groups we later calculate statistics of interest.

Next we generate weights for the calculation of result-statistics that accurately reflect the sample. Test centres with more test-samples should have a greater weight when calculating viral load or culture positivity average by age, and when calculating averages for age groups, age years with many participants should receive a greater weight than age years with fewer patients.

3.2.0.1 Calculating linear posterior predictions

Here we calculate linear posterior predictions of viral load \(\hat{Y}_{load|Age}\) for ages 0:100, separately for the three subject groups PAMS, Hospitalized, and Other. We calculate all posterior predictions on the level of posterior draws. In particular

\(\hat{Y}_{load|age,group} = \alpha_{load} + \beta_{load} X + \rho_C Z_C + \rho_{A,1} Z_A + \rho_{A,2} Z_A + \rho_{A,3} Z_A\)

where \(\alpha\) is the intercept, \(\beta\) are regression weights for co-variates (including subject group), \(Z\) and \(\rho\) are design matrices for random effects, \(._{C}\) and \(._{A,k}\) are indicators for test centre and spline basis functions for age. The \(k\) in \(._{A,k}\) indicates subject group specific spline coefficients.

We calculate culture positivity for the complete sample based on the association of viral load and percent positive culture estimated with the data from Woelfel at al and Perera et el. The posterior predictions of culture positivity are calculated as

\(\hat{Y}_{CP} = logit(\alpha_{CP} + \beta_{CP} load)\)

Estimated association between viral load and culture positivity based on data from Woelfel et al (2020) and Perera et al (2020).

Figure 3.1: Estimated association between viral load and culture positivity based on data from Woelfel et al (2020) and Perera et al (2020).

Therefore, we calculate the expected culture positivity given age and group as

\(\hat{Y}_{CP|\hat{Y}_{load|age,group}} = logit(\alpha_{CP} + \beta_{CP} \hat{Y}_{load|age,group})\).

The parameters \(\alpha_{CP}\) and \(\beta_{CP}\) are obtained independently from the model and parameters used to calculate \(\hat{Y}_{load|age,group}\). To properly combine the uncertainty of parameter estimates from both model, \(\hat{Y}_{CP|\hat{Y}_{load|age,group}}\) is calculate on the level of posterior distribution draws from the two analyses. That is, 4000 posterior predictions from the age model are paired randomly with 4000 posterior draws from the culture positiity model to calculate 4000 draws that make up the posterior distribution of the estimaged age and group wise culture positivity \(\hat{Y}_{CP|\hat{Y}_{load|age,group}}\).

In addition to calculating the expected culture positivity given the estimated viral load by age, we can also calculate the expected culture positivity given the observed viral loads as \(\hat{Y}_{CP|Y_{load|age,group}} = logit(\alpha_{CP} + \beta_{CP} Y_{load|age,group})\).

Table 3.1: Culture posisivity for observed viral loads by age group and PAMS status.
AgeGroup Other PAMS Hospitalized
0-5 0.1 (0.04, 0.19) 0.18 (0.09, 0.28) 0.1 (0.04, 0.18)
10-15 0.14 (0.06, 0.23) 0.22 (0.12, 0.33) 0.13 (0.05, 0.22)
15-20 0.16 (0.07, 0.26) 0.3 (0.19, 0.42) 0.16 (0.07, 0.26)
20-25 0.2 (0.1, 0.31) 0.37 (0.25, 0.5) 0.15 (0.07, 0.24)
20-65 0.19 (0.1, 0.3) 0.39 (0.27, 0.51) 0.16 (0.08, 0.26)
25-35 0.2 (0.1, 0.31) 0.41 (0.29, 0.54) 0.16 (0.08, 0.26)
35-45 0.18 (0.09, 0.29) 0.39 (0.27, 0.51) 0.16 (0.08, 0.26)
45-55 0.18 (0.09, 0.29) 0.37 (0.25, 0.5) 0.16 (0.08, 0.26)
5-10 0.13 (0.06, 0.23) 0.13 (0.06, 0.23) 0.1 (0.04, 0.18)
55-65 0.19 (0.1, 0.3) 0.38 (0.26, 0.5) 0.17 (0.08, 0.27)
>65 0.21 (0.11, 0.32) 0.34 (0.22, 0.46) 0.25 (0.14, 0.36)
Estimates are based on the average observed viral load in age groups, whereas figures and comparisons reported later are based on estimated viral loads.
Posterior distributions of percent positive culture for sub-samples.

Figure 3.2: Posterior distributions of percent positive culture for sub-samples.

3.2.1 Viral load by age

We start the preparation of results by calculating mean and credible intervals for age- and group-wise viral load. From this we can plot for age-wise viral load, stratified by subject group.

Estimated viral load by age. Confidence bands indicate 90% credible intervals

Figure 3.3: Estimated viral load by age. Confidence bands indicate 90% credible intervals

Using the just-calculated posterior predictions by subject group and age, we next compare PAMS and hospitalized subjects.

Viral load differences between PAMS and non-PMAS subjects by age

Figure 3.4: Viral load differences between PAMS and non-PMAS subjects by age

3.2.1.1 Age differences

First, we calculate the average estimated viral load for the three age groups 0-20, 20-65, 65 and older without stratification by age group.

Next we look at age differences in viral load by comparing younger subjects group against adults aged 20-65.

Table 3.2: Viral load and culture positivity by age and age difference
outcome sample 0-5 5-10 10-15 15-20 20-65 65-101 0-5 vs 20-65 5-10 vs 20-65 10-15 vs 20-65 15-20 vs 20-65
log10Load all 5.75 (5.6, 5.89) 5.95 (5.84, 6.05) 6.09 (6, 6.17) 6.24 (6.18, 6.3) 6.39 (6.36, 6.42) 6.38 (6.35, 6.42) -0.64 (-0.79, -0.5) -0.45 (-0.56, -0.33) -0.31 (-0.4, -0.22) -0.15 (-0.21, -0.09)
log10Load PAMS 6.16 (5.77, 6.54) 6.31 (6.02, 6.6) 6.49 (6.28, 6.68) 6.67 (6.55, 6.79) 6.88 (6.83, 6.92) 6.69 (6.5, 6.88) -0.72 (-1.1, -0.33) -0.57 (-0.86, -0.27) -0.39 (-0.6, -0.19) -0.2 (-0.32, -0.09)
log10Load Hospitalized 5.83 (5.56, 6.12) 5.86 (5.65, 6.07) 5.88 (5.72, 6.04) 5.9 (5.78, 6.03) 6.03 (5.99, 6.08) 6.43 (6.39, 6.48) -0.2 (-0.48, 0.09) -0.18 (-0.38, 0.03) -0.16 (-0.31, 0.01) -0.13 (-0.25, -0.01)
CP all 0.12 (0.05, 0.21) 0.15 (0.07, 0.25) 0.18 (0.09, 0.28) 0.21 (0.12, 0.32) 0.25 (0.16, 0.36) 0.24 (0.14, 0.35) -0.14 (-0.19, -0.09) -0.1 (-0.14, -0.07) -0.08 (-0.11, -0.05) -0.04 (-0.06, -0.02)
CP PAMS 0.19 (0.08, 0.33) 0.22 (0.11, 0.36) 0.27 (0.15, 0.4) 0.32 (0.2, 0.45) 0.39 (0.26, 0.51) 0.33 (0.2, 0.46) -0.19 (-0.3, -0.09) -0.16 (-0.26, -0.08) -0.12 (-0.2, -0.05) -0.07 (-0.11, -0.02)
CP Hospitalized 0.13 (0.05, 0.24) 0.13 (0.05, 0.23) 0.13 (0.06, 0.23) 0.14 (0.06, 0.24) 0.16 (0.08, 0.26) 0.25 (0.15, 0.37) -0.03 (-0.08, 0.02) -0.03 (-0.07, 0.01) -0.03 (-0.06, 0) -0.02 (-0.05, 0)
Estimated viral load age differences. Confidence bands indicate 90% credible intervals

Figure 3.5: Estimated viral load age differences. Confidence bands indicate 90% credible intervals

A more fine-grained comparison compares each age year against 50 years old adults.

Estimated viral load age differences. Confidence bands indicate 90% credible intervals

Figure 3.6: Estimated viral load age differences. Confidence bands indicate 90% credible intervals

To collect all age comparisons in one place and thus facilitate comparisons, we compose a table with estimated age differences for viral load and culture positivity and unadjusted age differences calculated directly from the raw data. This is table 2 of the main article.

Table 3.3: Differences of probability of positive culture between age groups.
Sample Comparison model-diff load model-diff CP Difference pMW
all 0-5 vs 20-65 -0.64 (-0.79, -0.5) -0.14 (-0.19, -0.09) -0.7 (-0.92, -0.49) <.001
all 5-10 vs 20-65 -0.45 (-0.56, -0.33) -0.1 (-0.14, -0.07) -0.55 (-0.82, -0.29) <.001
all 10-15 vs 20-65 -0.31 (-0.4, -0.22) -0.08 (-0.11, -0.05) -0.41 (-0.65, -0.17) 0.001
all 15-20 vs 20-65 -0.15 (-0.21, -0.09) -0.04 (-0.06, -0.02) -0.2 (-0.35, -0.05) 0.009
PAMS 0-5 vs 20-65 -0.72 (-1.1, -0.33) -0.19 (-0.3, -0.09) -0.76 (-1.68, 0.16) 0.068
PAMS 5-10 vs 20-65 -0.57 (-0.86, -0.27) -0.16 (-0.26, -0.08) -1 (-1.62, -0.38) 0.002
PAMS 10-15 vs 20-65 -0.39 (-0.6, -0.19) -0.12 (-0.2, -0.05) -0.56 (-1.1, -0.01) 0.035
PAMS 15-20 vs 20-65 -0.2 (-0.32, -0.09) -0.07 (-0.11, -0.02) -0.27 (-0.54, 0.01) 0.057
Hospitalized 0-5 vs 20-65 -0.2 (-0.48, 0.09) -0.03 (-0.08, 0.02) -0.42 (-1.22, 0.39) 0.137
Hospitalized 5-10 vs 20-65 -0.18 (-0.38, 0.03) -0.03 (-0.07, 0.01) -0.43 (-1.4, 0.54) 0.231
Hospitalized 10-15 vs 20-65 -0.16 (-0.31, 0.01) -0.03 (-0.06, 0) -0.22 (-1.2, 0.76) 0.409
Hospitalized 15-20 vs 20-65 -0.13 (-0.25, -0.01) -0.02 (-0.05, 0) -0.04 (-0.38, 0.3) 0.7
model-diff = model-based difference, CP = culture positivity, load = log10viral load, difference = raw difference with 95% confidende intervals, p = p-value, MW = Mann-Whitney U test

3.2.2 Culture positivity by age and differences between age

Age-wise culture positivity and derived differences were calculated above. Here we only plot the results of these calculations.

Estimated culture positivity and differences between age group. Confidence bands indicate 90% credible intervals

Figure 3.7: Estimated culture positivity and differences between age group. Confidence bands indicate 90% credible intervals

Culture positivity comparisons withe the age group 44-55.

Figure 3.8: Culture positivity comparisons withe the age group 44-55.

Next we calculate and plot the difference and ratio of culture probability for PAMS and hospitalized subjects, stratified by age in full years.

Viral load differences between PAMS and hospitalized subjects by age

Figure 3.9: Viral load differences between PAMS and hospitalized subjects by age

At the average viral load of 20-65 year old adults, PAMS and hospitalized cases have an expected culture positivity of 0.4 (0.3, 0.5) and 0.2 (0.1, 0.3), respectively, resulting in a differences of 0.2 (0.2, 0.3) or a ratio of 2.6 (1.8, 3.9).

3.3 B117 Viral load and culture positivity

To visualize the effect of B.1.1.7, we show posterior distributions for estimated viral load for B.1.1.7 and non-B.1.1.7 cases as well as for the difference between these two groups.

Posterior distributions of viral load and culture positivity for B.1.1.7 and non B.1.1.7 cases. A posterior distribution of log10 viral load. Darker shadings indicate 10 to 90% credible intervals. B difference of average viral load between B.1.1.7 and non-B.1.1.7 cases. C posterior distribution of the probability of a positive culture load. Darker shadings indicate 10 to 90% credible intervals. D Difference of average culture probability between B.1.1.7 and non-B.1.1.7 cases.

Figure 3.10: Posterior distributions of viral load and culture positivity for B.1.1.7 and non B.1.1.7 cases. A posterior distribution of log10 viral load. Darker shadings indicate 10 to 90% credible intervals. B difference of average viral load between B.1.1.7 and non-B.1.1.7 cases. C posterior distribution of the probability of a positive culture load. Darker shadings indicate 10 to 90% credible intervals. D Difference of average culture probability between B.1.1.7 and non-B.1.1.7 cases.

At 7.3 (7.0, 7.5) B.1.1.7 cases have on average a 1.1 (0.9, 1.3) higher viral load than non-B117 cases. Assuming that the association between culture positivity is the same for the B.1.1.7 and non-B.1.1.7 variants of the virus, this implies an average probability of a positive culture of 0.53 (0.37, 0.69) for B117 cases, compared to 0.19 (0.09, 0.32) for non-B117 cases, that is a 3.0 (1.9, 5.1) times higher culture positivity for B.1.1.7 cases.

To document the robustness of the result to different analysis approaches, we show the results of a number of alternative analyses. It should be kept in mind that the primary analysis with the most restricted data set has the relative cleanest design, i.e., the best matching of B.1.1.7 and non-B.1.1.7 cases.

Table 3.4: Log10 viral load differences between B.1.1.7 and non-B.1.1.7 cases. Each row shows the estimated effect of B.1.1.7 in an alternative analysis.
Window Adjusted Rand. eff Paired N B.1.1.7 N non-B.1.1.7 Effect B.1.1.7
Inf Yes No No 409 23015 0.97 (0.84, 1.10)
Inf Yes No No 409 23015 1.03 (0.90, 1.16)
Inf Yes No No 409 23015 0.97 (0.83, 1.11)
5 Yes No No 409 895 0.84 (0.68, 1.00)
5 Yes No No 409 895 1.04 (0.88, 1.20)
5 Yes No No 409 895 1.09 (0.92, 1.26)
1 Yes No No 409 492 0.76 (0.57, 0.95)
1 Yes No No 409 492 1.01 (0.82, 1.19)
1 Yes No No 409 492 1.09 (0.90, 1.28)
1 Yes Yes Yes 362 492 1.10 (0.91, 1.29)
Window: Number of days within which non-B.1.1.7 cases must occur in a test centre with B.1.1.7 cases to be included in the analysis. 5 = non-B.1.1.7 cases detected within +/-5 days of B.1.1.7 cases are included. Adjusted: Adjustment for age, PCR type, group (PAMS, Hospitalized, Other), and sex. Random effects: Modelling test centres as random effects. Paired: Yes if only test centres that report B.1.1.7 and non-B.1.1.7 centres are included. Effects are given with 90% credible intervals.

3.4 Viral load and culture positivity over time

3.4.1 Basic model parameters: Peak day and slopes

The following plot shows group-level means and standard deviations of patient level parameter estimates.

Posterior distributions of means and standard deviations for subject level model parameters. Numbers in the plot are mean and the 90% credible interval. Top row: Posterior distribution for mean over cases. Bottom row: Posterior distribution for standard deviation over cases

Figure 3.11: Posterior distributions of means and standard deviations for subject level model parameters. Numbers in the plot are mean and the 90% credible interval. Top row: Posterior distribution for mean over cases. Bottom row: Posterior distribution for standard deviation over cases

This model estimated the mean peak viral load to be 8.2 (8.0, 8.3) at 4.2 (4.0, 4.5) days after infections, and gradients of 2.0 (1.9, 2.1) and -0.167 (-0.171, -0.164). The corresponding standard deviations were 0.71 (0.67, 0.76), 0.83 (0.57, 1.13), 0.35 (0.21, 0.51) and 0.016 (0.012, 0.021), respectively.

The following plot shows the distribution of the mean (expected) model parameter over subjects. The larger standard deviation for the growth gradient parameter is due to using PAMS as a predictor for shift and growth gradient, which are two dependent parameters in the model. Related, the PAMS cases are also those with a lower peak viral load and fewer days to peak viral load.

Distribution of expected individual level model parameters. Expectations of individual level parameters are simply averages of posterior samples for a parameter for an individual.

Figure 3.12: Distribution of expected individual level model parameters. Expectations of individual level parameters are simply averages of posterior samples for a parameter for an individual.

One limitation of the available data is that testing did not follow some pre-planned schedule. Instead, the number of tests a person did most likely depended on factors like the duration of the illness. Hence we also investigate how model parameters are associated with the number of tests performed for a person:

Table 3.5: Estimated model parameters for cases with different numbers of tests per person.
# tests Slope of log10 load increase Days to peak load Peak log10 viral load Slope of log10 load decrease
3 2.01 (1.89, 2.15) 4.14 (3.88, 4.41) 8.04 (7.85, 8.24) -0.167 (-0.171, -0.164)
4 2 (1.88, 2.12) 4.23 (3.97, 4.49) 8.17 (7.98, 8.37) -0.167 (-0.171, -0.164)
5 2 (1.89, 2.13) 4.25 (3.99, 4.51) 8.23 (8.03, 8.44) -0.166 (-0.17, -0.163)
6-7 2 (1.88, 2.12) 4.31 (4.05, 4.58) 8.33 (8.13, 8.53) -0.167 (-0.171, -0.163)
>7 1.98 (1.86, 2.11) 4.39 (4.11, 4.68) 8.43 (8.23, 8.64) -0.168 (-0.172, -0.164)
Numbers are means and 90% credible intervals. Parameters were estimated in one joint model.
Means and credible intervals of model parameters split by cases with different numbers of tests.  Note that model parameters were estimated in one joint analyses.

Figure 3.13: Means and credible intervals of model parameters split by cases with different numbers of tests. Note that model parameters were estimated in one joint analyses.

The clear association between number of tests, with the peak viral load (intercept) and the down slope is expected, if one assumes that cases with a more severe illness have a higher peak viral load which is also declines more slowly. This analysis does not show clear differences in the attack rate (slope up) between individuals with different numbers of tests.

As a further test, we also estimated model parameters with different subset of cases, i.e., with at least 3, at least 4, … at least 9 tests. Figure 3.14 shows the estimated parameters for these data subsets.

Table 3.6: Estimated model parameters for sub samples with different minimum number of tests per person.
min # tests Slope ~ of ~ log[10] ~ load ~ increase Days ~ to ~ peak ~ load Peak ~ log[10] ~ viral ~ load Slope ~ of ~ log[10] ~ load ~ decrease
3 2 (1.89, 2.13) 4.21 (3.95, 4.47) 8.15 (7.96, 8.35) -0.17 (-0.17, -0.16)
4 1.94 (1.81, 2.08) 4.5 (4.18, 4.84) 8.3 (8.06, 8.55) -0.18 (-0.18, -0.17)
5 1.89 (1.75, 2.04) 4.58 (4.21, 4.95) 8.27 (7.99, 8.56) -0.18 (-0.18, -0.17)
6 1.79 (1.64, 1.96) 4.99 (4.55, 5.45) 8.49 (8.18, 8.8) -0.19 (-0.19, -0.18)
7 1.82 (1.64, 2.03) 5.13 (4.59, 5.69) 8.6 (8.19, 9) -0.19 (-0.2, -0.19)
8 1.89 (1.65, 2.16) 5.11 (4.45, 5.81) 8.73 (8.3, 9.17) -0.2 (-0.21, -0.19)
9 1.63 (1.4, 1.91) 5.84 (4.94, 6.79) 8.63 (8.01, 9.24) -0.2 (-0.21, -0.19)
Numbers are means and 90% credible intervals. Parameters were estimated separately.
Means and credible intervals of model parameters split by sub-samples with different number of minimum tests per case. Note that model parameters were estimated in independent analyses.

Figure 3.14: Means and credible intervals of model parameters split by sub-samples with different number of minimum tests per case. Note that model parameters were estimated in independent analyses.

3.4.2 Associations with adjustment variables

Posterior distribution of the association between PCR system and viral load.

Figure 3.15: Posterior distribution of the association between PCR system and viral load.

Figure 3.16 shows associations between test centre category and test log10 viral load. For instance, any tests taken at C19 centres resulted on average in 0.28 higher log10 viral load, compared to the grand mean and after adjustment for the other variables in the model. These effects were obtained as random effects.

Posterior distribution of random effects of test centre category on viral load. These random effects capture variations of average vairal load measured in different test centre categories.

Figure 3.16: Posterior distribution of random effects of test centre category on viral load. These random effects capture variations of average vairal load measured in different test centre categories.

Figure 3.17 shows associations between the test centre category of the first test and peak viral load. For instance participants with a first positive test obtained in a residence for the elderly (RES) had on average an estimated peak viral load of 7.9, compared to a value of 7.5 for subjects with a firt positive test from an outpatient department (OD). These effects were obtained as random effects.

Posterior distribution of random effects of test centre type for first test on peak viral load

Figure 3.17: Posterior distribution of random effects of test centre type for first test on peak viral load

Figure 3.18 shows associations between the centre type with longest stay and decay gradient. The test centre with the longest stay is defined as the test centre catgeory with longest period from the first to the last of consecutive tests from that test centre.

Posterior distribution of random effects of test centre type with longest stay on decay gradient. Stay is defined as the number of days between the first and the last test from a test centre. Cases without a unique centre with the longest stay are labeled X.

Figure 3.18: Posterior distribution of random effects of test centre type with longest stay on decay gradient. Stay is defined as the number of days between the first and the last test from a test centre. Cases without a unique centre with the longest stay are labeled X.

Model parameters by PAMS, hospitalisation status and gender. Note that due to limitations of the available data, both presence and absence of group differences in these result remain associated with uncertainty.

Figure 3.19: Model parameters by PAMS, hospitalisation status and gender. Note that due to limitations of the available data, both presence and absence of group differences in these result remain associated with uncertainty.

Table 3.7: Model parameters by PAMS and hospitalisation.
Grouped by group N Slope ~ of ~ log[10] ~ load ~ increase Days ~ to ~ peak ~ load Peak ~ log[10] ~ viral ~ load Slope ~ of ~ log[10] ~ load ~ decrease
Gender Female 1920 2.02 (1.88, 2.17) 4.17 (3.87, 4.48) 8.14 (7.94, 8.34) -0.171 (-0.176, -0.166)
Gender Male 2093 1.99 (1.84, 2.14) 4.24 (3.93, 4.58) 8.17 (7.96, 8.37) -0.164 (-0.168, -0.16)
Gender diff NA 0.03 (-0.14, 0.2) -0.07 (-0.43, 0.29) -0.03 (-0.11, 0.06) -0.007 (-0.012, -0.001)
Hospitalized Hospitalised 3212 1.96 (1.84, 2.09) 4.35 (4.07, 4.65) 8.29 (8.08, 8.49) -0.168 (-0.172, -0.164)
Hospitalized non-hosp. 807 2.17 (1.94, 2.44) 3.63 (3.21, 4.05) 7.6 (7.37, 7.83) -0.164 (-0.172, -0.157)
Hospitalized diff NA 0.21 (-0.04, 0.49) -0.73 (-1.2, -0.25) -0.69 (-0.86, -0.53) 0.004 (-0.004, 0.012)
PAMS PAMS 249 2.27 (1.93, 2.69) 3.34 (2.79, 3.94) 7.3 (6.95, 7.65) -0.171 (-0.184, -0.157)
PAMS non-PAMS 3770 1.99 (1.87, 2.11) 4.26 (4, 4.54) 8.21 (8.01, 8.41) -0.167 (-0.171, -0.164)
PAMS diff NA -0.29 (-0.69, 0.06) 0.92 (0.33, 1.5) 0.91 (0.6, 1.21) 0.004 (-0.011, 0.018)

3.4.3 Viral load over time

Figure 3.20 shows estimated viral loads over time. The red line and shaded area are expected viral load and the 90% credible interval. The blue lines are estimated time courses for individual patients.

Estimated viral load over time. Blue lines are posterior expectations (means) for individuals. The red line and shaded area are sample average and its 90% credible interval.

Figure 3.20: Estimated viral load over time. Blue lines are posterior expectations (means) for individuals. The red line and shaded area are sample average and its 90% credible interval.

3.4.4 Number / percent of cases tested befor peak viral load

Figure 3.21 shows that our analysis estimates that it takes on average around 11.6 days from infection to the first positive test, and that this average is is associated with considerable variation between individuals.

Histograms for number of days from infection to first positive test. The figure shows for each draw from the posterior sample a histogram of days from infection to first positive test.

Figure 3.21: Histograms for number of days from infection to first positive test. The figure shows for each draw from the posterior sample a histogram of days from infection to first positive test.

Here we calculate the expected proportion of people who had a first positive test result before peak viral load.

Estimated timing of first positive tests relative to peak viral load. The figure shows posterior distributions.

Figure 3.22: Estimated timing of first positive tests relative to peak viral load. The figure shows posterior distributions.

The estimated average probability to have a positive test before peak viral load is 22.1% (21.0, 23.2) 888 (845, 932) out of the 4019 subjects). Over all subjects, the estimated day for the first positive test was 7.35 (7.16, 7.54) after peak viral load, with a standard deviation of 7.75 (7.64, 7.85). For those who had the first positive test before peak load, we estimated this to be -1.3 (-1.4, -1.3) before peak viral load, whereas it was estimated to be 9.8 (9.6, 10.0) for the others (sd: 7.0 (6.9, 7.1)).

The mean estimated day of detection for PAMS subjects under 70 years is 4.9 (1.3, 8.7) days versus the non-PAMS mean of 6.5 (2.8, 10.1) days.

Note that the reliability of these results is linked to the reliability with which we could estimate when in the time course of the infection the tests was done, which remains somewhat uncertain.

3.4.5 Days from non-infectiousness or detectable viral load to peak viral load

Here we calculate the number of days from barely non-detectable viral load or culture positivity of only 2.5% to peak viral load and thus peak culture positivity.

Expected Numbers of days from 2.5 to maximum culture positivity (left) and from 2.5 log10 viral load to maxium viral load. 90% credible interval in parentheses

Figure 3.23: Expected Numbers of days from 2.5 to maximum culture positivity (left) and from 2.5 log10 viral load to maxium viral load. 90% credible interval in parentheses

Note that the reliability of these results is linked to the reliability with which we could estimate when in the time course of the infection the tests was done, which is currently uncertain.

3.4.6 Culture positivity over time

Figure 3.24 shows estimated proportion of positive cultures over time.

Estimated culture positivity over time. Blue lines are posterior expectations (means) for individuals. The red line and shaded area are sample average and its 90% credible interval.

Figure 3.24: Estimated culture positivity over time. Blue lines are posterior expectations (means) for individuals. The red line and shaded area are sample average and its 90% credible interval.

We estimate a peak viral load of 8.2 (8.0, 8.3) at around 4.2 (4.0, 4.5) days after the estimated day of infection. Given the non-linear association between viral load and culture infectivity, the estimated culture infectivity shows greater changes over time, with a peak of 0.75 (0.61, 0.86) (between participation standard deviation: 0.17 (0.13, 0.21)), which declines to 0.53 (0.40, 0.65) 5 days later, and 0.29 (0.19, 0.40) 10 days after peak viral load.

3.4.7 Conditional effects of age on model parameters

We used a spline model to estimate the association between age and peak load as well as viral loads slopes.

Associations between age and individual level model parameters. The left column shows conditional effects, that is the associations with age after marginalizing out variability due to PAMS status and gender. Ribbons show 90% credible intervals. The histograms in the right column show posterior distributions for the average parameter estimate for the sample. The gray inlay at the bottom of panel C shows the age distribution of the sample.

Figure 3.25: Associations between age and individual level model parameters. The left column shows conditional effects, that is the associations with age after marginalizing out variability due to PAMS status and gender. Ribbons show 90% credible intervals. The histograms in the right column show posterior distributions for the average parameter estimate for the sample. The gray inlay at the bottom of panel C shows the age distribution of the sample.

3.4.8 Culture positivity over time by group

So far we have described viral load and culture positivity over time for the complete sample. Next, we investigate age differences. Here are the full time courses by age groups.

Viral load and culture positvity time courses by age group and differences in culture positivity.

Figure 3.26: Viral load and culture positvity time courses by age group and differences in culture positivity.

Viral load and culture positvity time courses by PAMS status at the first positive test

Figure 3.27: Viral load and culture positvity time courses by PAMS status at the first positive test

Viral load and culture positvity time courses by by B.1.1.7 and PAMS status at the first positive test

Figure 3.28: Viral load and culture positvity time courses by by B.1.1.7 and PAMS status at the first positive test

Viral load and culture positvity time courses by by B.1.1.7 status

Figure 3.29: Viral load and culture positvity time courses by by B.1.1.7 status

Viral load and culture positvity time courses by sex and differences in culture positivity

Figure 3.30: Viral load and culture positvity time courses by sex and differences in culture positivity

3.4.9 Viral load and culture positivity at -2, 0, 5, and 10 days from peak viral load

Estimated viral load at key time points by age group. The shaded lines cover the 90% credible interval.

Figure 3.31: Estimated viral load at key time points by age group. The shaded lines cover the 90% credible interval.

Figure 3.32 shows all pairwise comparisons for the day of peak viral load for the age groups.

Age differences in maximal viral load.

Figure 3.32: Age differences in maximal viral load.

Figure 3.33 shows pairwise comparisons for the day of peak infectiousness (estimated proportion of positive cultures) for the age groups.

Age differences in maximal viral load.

Figure 3.33: Age differences in maximal viral load.

Table 3.8: Differences in peak viral load between age groups
comparison Peak viral load Peak culture positivity
0-5<80><93>45-55 -0.73 (-1.06,-0.41) -0.215 (-0.347,-0.108)
10-15<80><93>45-55 -0.65 (-0.97,-0.27) -0.188 (-0.314,-0.068)
15-20<80><93>45-55 -0.5 (-0.7,-0.29) -0.142 (-0.218,-0.072)
20-45<80><93>45-55 -0.32 (-0.42,-0.22) -0.089 (-0.128,-0.055)
5-10<80><93>45-55 -0.56 (-0.93,-0.16) -0.16 (-0.298,-0.04)
55-65<80><93>45-55 0.22 (0.14,0.31) 0.055 (0.031,0.082)
65+<80><93>45-55 0.53 (0.41,0.65) 0.119 (0.081,0.165)

Here are the age-specific results for results for culture positivity.

Estimated culture positivity at key time points by age group. The shaded lines cover the 90% credible interval.

Figure 3.34: Estimated culture positivity at key time points by age group. The shaded lines cover the 90% credible interval.

The variation in culture infectiousness within age groups appear to be larger than the variation between age groups.

4 Article abstract and results

Note that the text (not the numbers!) in this section can show small deviations from the submitted version due to final text-edits that were made in this R-Markdown document.

4.1 Abstract

In clinical virology, two elementary parameters for quantifying viral infection and shedding respectively are viral load in the specimen, and whether a sample can yield a replicating virus isolate in cell culture. We examined viral load data from 23,424 RT-PCR confirmed SARS-CoV-2 infections detected in Germany from February 2020 to February 2021. The cases include 409 B.1.1.7 lineage infections, as well as 5,682 detected at testing locations typically attended by likely pre-symptomatic, asymptomatic, and mildly-symptomatic (PAMS) cases. We investigate absolute and relative estimated infectiousness according to age and symptomatic status at time of detection and over the course of infection. At the time of detection, B.1.1.7 subjects had mean first-positive viral load ~1.1 log10 higher than in non-B.1.1.7 subjects, while PAMS subjects generally had log10 viral loads 0.56 higher than hospitalized subjects of the same age. Comparing age groups, the youngest (0-5 years) had mean unadjusted viral load 0.7 lower than adults aged 20-65, with the viral load difference between young and old reducing steadily with increasing age of the young group, and little difference among those aged 20-65. Viral load and estimated infectiousness at the time of detection are highly variable, with a log10 viral load of 9 or higher in the first positive test of 1,978 subjects. Of these, 34.6% were likely PAMS, with mean age of 37.5 years, and high standard deviation of 13.5 years. B.1.1.7 subjects have estimated culture infectivity ~3x higher than non-B.1.1.7 subjects at the time of detection. From 4,019 patients with multiple RT-PCR tests, 80 % of whom were hospitalized at some point, we estimate a time of 4.2 days from infection to an estimated peak log10 viral load of 8.2, peak culture infectivity of 0.75, and viral load decline of 0.17 per day thereafter. Considered with incubation period and culture infectivity results from other studies, these values suggest peak viral load occurs 1-3 days before symptom onset. The least infectious children (0-5 years) have at least 0.69 (0.49,0.84)% of the peak probability of cell culture replication of adults aged over 20, and only minor differences are seen in infectiousness estimates according to gender or hospitalisation status. These clinical data show that those aged up to 20 years have viral loads and estimated infectiousness only slightly lower than that of older subjects, that PAMS subjects of all ages circulate in the community with viral loads and infectiousness that, at the time of detection, is slightly higher than similarly-aged symptomatic patients. B.1.1.7 subjects have estimated probability of cell culture replication 3.0 (1.9, 5.1) times higher than non-B.1.1.7 subjects at the time of detection, suggesting that higher viral load in B.1.1.7 subjects may result in increased infectiousness.

4.2 Results: Sample

We examined RT-PCR results from 393,129 subjects aged 0-100 years screened for SARS-CoV-2 infection from February 24, 2020 to March 01, 2021. Samples were collected at test centres, mostly in and around Berlin, Germany, and analysed with Roche LC480 or cobas PCR systems. Of all samples, 23,424 (2.95%) had at least one positive RT-PCR test (Table 1). Based on the submitting test centre type, we classified 5,636 (24.06%) as likely PAMS at the time of their first positive test (Figure 1A and Table S1), including 5,682 of 147,202 (3.86%) tested at public Berlin COVID-19 test centres and 487 of 41,907 (1.16%) tests of Berlin airport returnees in August 2020. PAMS subjects were typically younger than non-PAMS subjects, with the great majority being 25-53 years versus 40-84 years, respectively (Figure 1, Table 1). Regarding hospitalization, 6,389 (27%) of our subjects were already in hospital at the time of their first positive test and 8,756 (37%) were hospitalized at some point during their infection. Typing RT-PCR indicated that 409 cases likely involved the B.1.1.7 lineage strain, many of which originated from two Berlin hospital outbreaks. The accuracy of the B.1.1.7 typing PCR indications was confirmed by next-generation sequencing of 49 complete viral genomes.

4.3 Results: First positive viral load

Across all subjects, the mean viral load (herein given as log10 RNA copies per swab) in the first positive-testing sample was 6.4, with standard deviation (sd) 1.82. PAMS subjects had viral loads higher than the hospitalised for ages up to 70 years (Table 1), as exemplified by a 6.9 mean for PAMS compared to a 6.1 mean in non-PAMS adult subjects of 20-65 years (Table 1). Crude comparisons of viral loads in age groups show no substantial difference in first-positive viral load between groups of people aged over 20 years (Table 1) and that children and adolescents have between -0.7 (-0.92, -0.49) and -0.2 (-0.35, -0.05) lower viral loads than adults aged 20-65 (Table 2).

We used a Bayesian thin-plate spline regression to estimate the relationship between age, PAMS status, and viral load from the first positive RT-PCR of each subject, adjusting for gender, type of test centre, and PCR system used. The Bayesian model well represents the observed data (Figures 1B and S1, Table 2). The raw data and the Bayesian estimation (Figure 2A), suggest considering subjects in three age categories: young (ages 0-20 years, grouped into five-year brackets), adult (20-65 years), and elderly (over 65 years). We estimated an average viral load of 6.28 with a 90% credible interval of (6.26, 6.31) for adults (Figure 2A) and a very similar mean of 6.40 (6.33, 6.47) for the elderly. Younger age groups had lower viral loads than adults, with the difference falling steadily from -0.64 (-0.79, -0.5) for the very youngest (0-5 years) to -0.15 (-0.21, -0.09) for older adolescents (15-20 years) (Table 2). Viral loads of subjects younger than 65 years old were around 0.8 higher for PAMS than for hospitalized (Figure 2C). Young age groups of PAMS subjects have lower viral loads than older PAMS subjects, with differences ranging from -0.2 (-0.32, -0.09) to -0.57 (-0.86, -0.27). Among hospitalized subjects these differences are somewhat smaller, ranging from -0.2 (-0.48, 0.09) to -0.13 (-0.25, -0.01) (Table 2, Figure 2B). Viral loads of subjects younger than 65 years were around 0.75 higher for PAMS than for hospitalised (Figure 2C).

We estimated the association between age and culture probability by combining the Bayesian regression estimations with cell culture isolation data from our own laboratory14 and from Perera et al.15 (Figures 2D, 2E, 2F). Across all ages, the average estimated culture probability at the time of first positive RT-PCR 0.24 (0.15, 0.35). The mean probability is higher for PAMS cases, at 0.38 (0.26, 0.51), than hospitalized cases, at 0.21 (0.11, 0.32). PAMS and hospitalized adults have estimated culture probabilities of and , respectively (Figure 2F). In PAMS cases, we found differences, in particular for children aged 0-5 compared to adults aged 20-65, with average culture probabilities of and respectively, and a difference of -0.19 (-0.3, -0.09). Age group differences in hospitalized cases range from -0.03 (-0.08, 0.02) to -0.02 (-0.05, 0) (Table 2).

First-positive viral loads are weakly bimodally distributed (Figures 1A, 2A), which is not reflected in age-specific means. The resultant distribution of culture probability includes a majority of subjects with relatively low estimated shedding and a small minority with a very high probability of positive culture (Figure 2D). The highly-infectious subset of subjects includes 1,978 of 23,424 positive subjects (8.44%) with a first-positive viral load of at least 9.0 log10, corresponding to an estimated culture probability of ~0.92 to 1.0. Of these 1,978 subjects, 684 (34.58%) were likely PAMS at the time of testing with a mean (median) age of 37.53 (33.81), and sd of 13.5 years.

4.4 Results: B.1.1.7

The 409 B.1.1.7 subjects in our dataset had an observed mean first-positive viral load of 7.33 (sd 1.56), which is 0.98 log10 higher (CI: 0.82, 1.13) than non-B.1.1.7 subjects in the full dataset. To increase specificity, we compared 362 B.1.1.7 cases with 0 non-B.1.1.7 cases using viral loads only from centres with B.1.1.7 and non-B.1.1.7 cases, and only from the same day or one day before or after the B.1.1.7 sample was taken. This analysis also adjusted for PAMS status, gender, RT-PCR system, subject age, and modeled random test centre effects. The results show that B.1.1.7 cases are associated with a 1.1 (0.9, 1.3) higher viral load (Table S2). This results in an average estimated culture probability of 0.53 (0.37, 0.69), which is considerably higher than the overall figure of 0.19 (0.09, 0.32) for the non-B.1.1.7 subjects in the comparison and corresponds to a 3.0 (1.9, 5.1) times higher culture probability for B.1.1.7 cases.

4.5 Results: Viral load time course

We used the time series of 4,019 subjects with RT-PCR results from at least three days to investigate viral load over the course of the infection. We estimated attack and decline slopes of a model of linear increase and then decline of log10 viral load using a Bayesian hierarchical model. We estimated time from infection to peak viral load of 4.21 (3.95, 4.47) days, mean peak viral load of 8.2 (8.0, 8.3), and mean decreasing viral load slope of -0.167 (-0.171, -0.164) log10 per day. Individual level trajectories can differ considerably from the trajectory described by these mean parameters (Figure 3B). We estimate NA (NA, NA) (NaN (NA, NA) percent) of the subjects had a positive test before the time of their peak viral load, with mean day -1.3 (-1.4, -1.3). Of those with a first-positive test estimated to have occurred after the day of peak viral load, the mean estimated day of detection was 9.8 (9.6, 10.0) with sd 7.0 (6.9, 7.1), reflecting a broad time range of infection detection. Viral load time courses are similar across age groups, though younger subjects have somewhat lower peak viral load than adults older than 45 (Figure 4A, Figure S13, and Table S4). Applied to subjects with higher minimum number of RT-PCR results (from four to nine), model results are self-consistent and agree well with a separate implementation based on simulated annealing (Table S2, Figures S6).

We estimate that the rise from near zero to peak culture probability takes 1.8 (1.2, 2.5) days, with a mean peak culture probability of 0.75 (0.61, 0.86) (Figure 3C). Mean culture probability then declines to 0.53 (0.40, 0.65) at five days and to 0.29 (0.19, 0.40) at ten days after peak viral load. Figure 3C shows that individual-level time-courses can deviate substantially from these mean estimates. Peak culture probabilities for age groups range from a low of 0.47 (0.31, 0.63) for 0-5 year olds to 0.81 (0.67, 0.91) for subjects older than 65 years (Figure 4B, 4D). Hence, the least infectious youngest children have 0.69 (0.49,0.84) percent of the peak culture probability of the most infectious older age group (Figures 4B, 4D and Table S4).

5 Figures

5.1 Figure 1: First positive test - with raw data

Distributions of viral loads as subject ages: (A) First positive viral loads in 5 year age groups, split by PAMS status. B) Observed (red) and predicted (blue) age-wise mean viral loads with confidence and credible intervals, respectively. C) Distribution of subject ages for PAMS and non-PAMS subjects.

Figure 5.1: Distributions of viral loads as subject ages: (A) First positive viral loads in 5 year age groups, split by PAMS status. B) Observed (red) and predicted (blue) age-wise mean viral loads with confidence and credible intervals, respectively. C) Distribution of subject ages for PAMS and non-PAMS subjects.

5.2 Figure 2: First positive test - age differences

Viral load and cultrure infectivty according to age. Top row: Viral load according to age. A) The number of samples in 10-year age categories over the months of data collection. B) Viral load according to age and symptom status. PAMS are cases tested at Berlin airports, Covid-19 test centres, or routine employee testing at Charit<U+00E9> hospital Berlin. The dotted line outlines the 90% credible interval for the complete sample. The small histogram (right) shows the observed viral load distribution. Shaded areas are 90% credible intervals. C) Differences in viral load in four age group comparisons (age ranges given in years) and also according to symptom status. Bottom row: Viral load according to days from peak viral load. D) Association of viral load and culture positivity. E) Culture infectivity according to age and symptom status. The small histogram to the right displays expected culture infectivity, which was calculated by applying the results shown in A to observed viral loads (see Figure 1B). C) Differences in culture positivity in four age group comparisons and also according to symptom status. F) Period of highest infectivity according to length of testing period and age.

Figure 5.2: Viral load and cultrure infectivty according to age. Top row: Viral load according to age. A) The number of samples in 10-year age categories over the months of data collection. B) Viral load according to age and symptom status. PAMS are cases tested at Berlin airports, Covid-19 test centres, or routine employee testing at Charit<U+00E9> hospital Berlin. The dotted line outlines the 90% credible interval for the complete sample. The small histogram (right) shows the observed viral load distribution. Shaded areas are 90% credible intervals. C) Differences in viral load in four age group comparisons (age ranges given in years) and also according to symptom status. Bottom row: Viral load according to days from peak viral load. D) Association of viral load and culture positivity. E) Culture infectivity according to age and symptom status. The small histogram to the right displays expected culture infectivity, which was calculated by applying the results shown in A to observed viral loads (see Figure 1B). C) Differences in culture positivity in four age group comparisons and also according to symptom status. F) Period of highest infectivity according to length of testing period and age.

5.3 Figure 3: B.1.1.7

Posterior distributions of estimated viral loads and culture probabilities and differences between B.1.1.7 and non-B.1.1.7 subjects. A posterior distribution of log10 viral load. Darker shadings indicate 10 to 90% credible intervals. B difference of average viral load between B.1.1.7 and non-B.1.1.7 cases. C posterior distribution of the probability of a positive culture load. Darker shadings indicate 10 to 90% credible intervals. D Difference of average culture probability between B.1.1.7 and non-B.1.1.7 cases.

Figure 5.3: Posterior distributions of estimated viral loads and culture probabilities and differences between B.1.1.7 and non-B.1.1.7 subjects. A posterior distribution of log10 viral load. Darker shadings indicate 10 to 90% credible intervals. B difference of average viral load between B.1.1.7 and non-B.1.1.7 cases. C posterior distribution of the probability of a positive culture load. Darker shadings indicate 10 to 90% credible intervals. D Difference of average culture probability between B.1.1.7 and non-B.1.1.7 cases.

5.4 Figure 4: Time course

Viral load and culture infecivity over time in an infection. A) The number of samples for which at least three test results were available, according to age. B) Estimated time course of viral load. Blue lines are expected time courses for individual cases. The sample average is shown in red. The small histogram to the right shows the distribution of viral loads in the sample. Cases with at least one test in a hospital ward are categorized as hospitalised. C) Viral load time course by age group. E) Estimated time course of culture infectivity, calculated by applying the results shown in A to the observed viral load time courses shown in Figure 1E. Blue lines are expected time courses for individual time cases. The sample average is shown in red. The small histogram to the right shows the distribution of culture infectivity in the sample, calculated by applying results shown in A to observed viral loads (see Figure 1E). Cases with at least one test in a hospital ward are categorized as hospitalised. F) Culture infecitivity time course by age group.

Figure 5.4: Viral load and culture infecivity over time in an infection. A) The number of samples for which at least three test results were available, according to age. B) Estimated time course of viral load. Blue lines are expected time courses for individual cases. The sample average is shown in red. The small histogram to the right shows the distribution of viral loads in the sample. Cases with at least one test in a hospital ward are categorized as hospitalised. C) Viral load time course by age group. E) Estimated time course of culture infectivity, calculated by applying the results shown in A to the observed viral load time courses shown in Figure 1E. Blue lines are expected time courses for individual time cases. The sample average is shown in red. The small histogram to the right shows the distribution of culture infectivity in the sample, calculated by applying results shown in A to observed viral loads (see Figure 1E). Cases with at least one test in a hospital ward are categorized as hospitalised. F) Culture infecitivity time course by age group.

5.5 Figure 5: Culture positivity by group

Age differences in viral load and culture infectivity over time

Figure 5.5: Age differences in viral load and culture infectivity over time

6 Stan model for time course estimation

data {
  // Data for time course analysis
  int N_DAY;                       // number of data points
  vector[N_DAY] Y_DAY;             // outocome log10 viral load
  vector[N_DAY] X_DAY;             // day of measurement in time series
  int G;                           // number of subjects
  int gstart_DAY[G];               // first day in time series
  int gend_DAY[G];                 // last day in time series
  int N_NegTests;                  // Total number negative test results
  int idx_NegTests[N_NegTests];    // position of negative test results in Y_DAY
  vector[N_DAY] PCR;               // Type of PCR system used for test
  int N_centres;                   // Number of test centre types
  matrix[N_DAY,N_centres] centre;  // Matrix with centre types for each test
  int N_centre1;                   // Number of test centre types for 1st test
  int centre1[G];                  // centre of each participants 1st test
  int N_ld_centre;                 // Number of test centre types with longest "stay"
  int ld_centre[G];                // each subjects centre types with longest "stay"
  real imputation_limit;           // upper limit for imputed viral loads for neg tests
  int num_basis_Age;               // numberf of bases for Age spline model
  matrix[num_basis_Age, G] B_Age;  // regression matrix for Age spline model
  vector[G] Age;                   // Cases ages
  int K_PGH;                       // Number of covariates
  matrix[G,K_PGH] X_PGH;           // covariates 
  vector[G] max_load;              // Maximum measured load per case
  int condition_on_data;           // 1 for parameter estimation, 0 for prior predictive plots
  // Data for culture positivity analysis
  int N_CP;                        // number of data points
  int Y_CP[N_CP];                  // Outcome (0,1)
  matrix[N_CP,1] X_CP;             // Predictor (log10 viral load)
}

parameters {
  // grand means for key model parameters
  real log_slope_up_mu;
  real log_slope_down_mu;
  real log_intercept_mu;
  
  // covariates for key model parameters
  row_vector[num_basis_Age] a_raw_intercept_Age;
  real<lower=0> tau_intercept_Age;
  real<lower=0> a0_intercept_Age; 
  vector[K_PGH] betaPGH_intercept;
  row_vector[num_basis_Age] a_raw_slope_up_Age;
  real<lower=0> tau_slope_up_Age;
  real a0_slope_up_Age; 
  vector[K_PGH] betaPGH_slope_up;
  row_vector[num_basis_Age] a_raw_slope_down_Age;
  real<lower=0> tau_slope_down_Age;
  real a0_slope_down_Age; 
  vector[K_PGH] betaPGH_slope_down;

  // individual random effects for key model parameters
  real<lower=0> intercept_sigma;
  vector[G] intercept_raw; 
  real<lower=0> slope_up_sigma;
  vector[G] slope_up_raw;
  real<lower=0> slope_down_sigma;
  vector[G] slope_down_raw;
  real<lower=0> sigma_sigma;
  vector[G] sigma_raw;
  
  // first centre random effects for key model parameters
  real<lower=0> int_centre1_sigma;
  vector[N_centre1] int_centre1_raw;
  real<lower=0> slope_down_ld_centre_sigma;
  vector[N_ld_centre] slope_down_ld_centre_raw;
  
  // smoothness parameter for logistic weight function
  real<lower=0> beta_sweight_mu;
  
  // impute negative tests
  vector<lower=-25,upper=imputation_limit>[N_NegTests] imp_neg;
  
  // shifts for individuals
  vector<lower=-10,upper=20>[G] b_shift;
  // random effects of cirst centre on shifts
  real<lower=0> shift_centre1_sigma;
  vector[N_centre1] shift_centre1_raw;
  
  // error variance
  real sigma_mu; // mean
  
  // centre random effects on viral load
  vector[N_centres] centre_raw;
  real<lower=0> centre_sigma;
  // PCR system effect on viral load
  real intercept_PCR;
  
  // culture positivity parameters
  real alpha_CP;
  vector[1] beta_CP;
}

transformed parameters {
  // spline coefficients
  row_vector[num_basis_Age] a_slope_up_Age = a_raw_slope_up_Age*tau_slope_up_Age;
  row_vector[num_basis_Age] a_intercept_Age = a_raw_intercept_Age*tau_intercept_Age;
  row_vector[num_basis_Age] a_slope_down_Age = a_raw_slope_down_Age*tau_slope_down_Age;
  
  // centre random effects on measured loads
  vector[N_centres] int_centr = centre_raw * centre_sigma;
  // error variance
  vector[G] sigma = exp(sigma_mu + sigma_sigma*sigma_raw);
  // initialize vectors for individual level parameters
  vector<lower=0>[G] intercept;
  vector<lower=0>[G] slope_up;
  vector<upper=0>[G] slope_down;
  vector<lower=0>[G] time2peak;
  vector[G] shift;
  { // calculate model parameters
     vector[N_centre1] shift_centre1 = shift_centre1_sigma * shift_centre1_raw;
     vector[N_centre1] int_centre1 = int_centre1_sigma * int_centre1_raw;
     vector[N_ld_centre] slope_down_ld_centre = slope_down_ld_centre_sigma * slope_down_ld_centre_raw;
     vector[G] log_intercept = 
                 log_intercept_mu + // intercept (fixed effect [FE], group level mean)
                 intercept_sigma * intercept_raw + // individual level random effects
                 a0_intercept_Age * Age + to_vector(a_intercept_Age * B_Age) + // age with splines (FE)
                 X_PGH * betaPGH_intercept; // covariates (FE)
     vector[G] log_slope_up = 
                 log_slope_up_mu +
                 slope_up_sigma * slope_up_raw + 
                 a0_slope_up_Age * Age + to_vector(a_slope_up_Age * B_Age) + 
                 X_PGH * betaPGH_slope_up; 
     vector[G] log_slope_down = 
                 log_slope_down_mu + 
                 slope_down_sigma * slope_down_raw + 
                 a0_slope_down_Age * Age + to_vector(a_slope_down_Age * B_Age) + 
                 X_PGH * betaPGH_slope_down;
     // test-centre category based random effects
     for (g in 1:G) {
       shift[g] = b_shift[g] + shift_centre1[centre1[g]];
       log_intercept[g] += int_centre1[centre1[g]];
       log_slope_down[g] += slope_down_ld_centre[ld_centre[g]];
     }
     // apply link function
     intercept = exp(log_intercept);
     slope_up = exp(log_slope_up);
     slope_down = -exp(log_slope_down);
  }
  // print("int_centre1_sigma: ",int_centre1_sigma);
  // print("slope_down_ld_centre_sigma: ",slope_down_ld_centre_sigma);
  // print("log_intercept_mu: ",log_intercept_mu);
  // print("log_slope_up_mu: ",log_slope_up_mu);
  // print("a0_intercept_Age: ",log_intercept_mu);
  // print("a0_slope_up_Age: ",log_slope_up_mu);
  // print("betaPGH_intercept: ",betaPGH_intercept);
  // print("betaPGH_slope_up: ",betaPGH_slope_up);
  // print("SLOPE-UP: ",slope_up);
  // print("INTERCEPT: ",intercept);
  // print("time2peak: ",intercept ./ slope_up);
  time2peak = intercept ./ slope_up;
}

model {
  // DAY
  log_intercept_mu ~ normal(2.1,.1); 
  log_slope_up_mu ~ normal(.6,.25); 
  log_slope_down_mu ~ normal(-1.75,.5); 
  beta_sweight_mu ~ normal(10,1); 
  
  shift_centre1_sigma ~ std_normal();
  shift_centre1_raw ~ std_normal();
  
  intercept_sigma ~ gamma(4,20);
  intercept_raw ~ std_normal();
  
  slope_up_sigma ~ gamma(4,20);
  slope_up_raw ~ std_normal();
  
  slope_down_sigma ~ gamma(4,20);
  slope_down_raw ~ std_normal();
  
  sigma_mu ~ std_normal();
  sigma_sigma ~ std_normal();
  sigma_raw ~ std_normal();
  
  a0_intercept_Age ~ normal(0,.125);
  a_raw_intercept_Age ~ std_normal();
  tau_intercept_Age ~ normal(0, .5); 
  betaPGH_intercept ~ normal(0,.125);
  // one sd change in age changes slope by up to around 20%
  // pnorm(.2,0,.125) = .945, exp(.2) = 1.2
  a0_slope_up_Age ~ normal(0,.125);
  a_raw_slope_up_Age ~ std_normal(); 
  tau_slope_up_Age ~ normal(0, .75);  
  betaPGH_slope_up ~ normal(0,.125);
  a0_slope_down_Age ~ normal(0,.225);
  a_raw_slope_down_Age ~ std_normal(); 
  tau_slope_down_Age ~ normal(0, .75); 
  betaPGH_slope_down ~ normal(0,.225);
  
  // random centre effects
  centre_raw ~ std_normal();
  centre_sigma ~ std_normal();
  
  int_centre1_sigma ~ normal(0,.125);
  int_centre1_raw ~ std_normal();
  slope_down_ld_centre_sigma ~ normal(0,.125);
  slope_down_ld_centre_raw ~ std_normal();
  
  intercept_PCR ~ normal(0,1);
  
  // impute negative tests
  // imp_neg ~ student_t(2,0,1);

  if (condition_on_data == 1) {
    vector[N_DAY] Y_DAY_imputed = Y_DAY;
    for (j in 1:N_NegTests) {
      Y_DAY_imputed[idx_NegTests[j]] = imp_neg[j];
    }
    
    for (g in 1:G) {
      int N_g = gend_DAY[g] - gstart_DAY[g] + 1;
      vector[N_g] DAY_shifted1 = X_DAY[gstart_DAY[g]:gend_DAY[g]] + shift[g];
      vector[N_g] weight_down1 = inv_logit(DAY_shifted1 * beta_sweight_mu);
      vector[N_g] est_up1 = intercept[g] + slope_up[g]*DAY_shifted1;
      vector[N_g] est_down1 = intercept[g] + slope_down[g]*DAY_shifted1;
      vector[N_g] yhat1 = (1-weight_down1) .* est_up1 + weight_down1 .* est_down1;

      yhat1 += PCR[gstart_DAY[g]:gend_DAY[g]] * intercept_PCR;
      yhat1 += centre[gstart_DAY[g]:gend_DAY[g],] * int_centr;
      //print("yhat: ",yhat1);
      target += normal_lpdf(Y_DAY_imputed[gstart_DAY[g]:gend_DAY[g]] | yhat1, sigma[g]);
    }
    // culture positivity analysis
    alpha_CP ~ normal(0,20);
    beta_CP ~ normal(0,3);
    Y_CP ~ bernoulli_logit_glm(X_CP,alpha_CP, beta_CP);
  }
}

generated quantities {
  real slope_up_mu = exp(log_slope_up_mu);
  real slope_down_mu = exp(log_slope_down_mu);
  real intercept_mu = exp(log_intercept_mu);
  real time2peak_mu = intercept_mu/slope_up_mu;
}

References

Betancourt, Michael. 2016. Diagnosing Suboptimal Cotangent Disintegrations in Hamiltonian Monte Carlo,” April. http://arxiv.org/abs/1604.00695.
Bürkner, Paul-Christian. 2017. brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.
———. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” The R Journal 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.
Carpenter, Bob, Andrew Gelman, Matthew Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1): 1–32.
Gabry, Jonah, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman. 2019. “Visualization in Bayesian Workflow.” J. R. Stat. Soc. A 182: 389–402. https://doi.org/10.1111/rssa.12378.
Goodrich, Ben, Jonah Gabry, Imad Ali, and Sam Brilleman. 2020. “Rstanarm: Bayesian Applied Regression Modeling via Stan.” https://mc-stan.org/rstanarm.
Kampen, Jeroen J A van, David A M C van de Vijver, Pieter L A Fraaij, Bart L Haagmans, Mart M Lamers, Nisreen Okba, Johannes P C van den Akker, et al. 2020. Shedding of infectious virus in hospitalized patients with coronavirus disease-2019 (COVID-19): duration and key determinants.” medRxiv, June, 2020.06.08.20125310.
Perera, Ranawaka Apm, Eugene Tso, Owen T Y Tsang, Dominic N C Tsang, Kitty Fung, Yonna W Y Leung, Alex W H Chin, et al. 2020. SARS-CoV-2 virus culture from the upper respiratory tract: Correlation with viral load, subgenomic viral RNA and duration of illness.” medRxiv, July, 2020.07.08.20148783.
Vehtari, Aki, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner. 2020. Rank-Normalization, Folding, and Localization: An Improved \(\widehat\{R\}\) for Assessing Convergence of MCMC.” Bayesian Analysis.
Wood, S. N. 2003. “Thin-Plate Regression Splines.” Journal of the Royal Statistical Society (B) 65 (1): 95–114.
Wölfel, Roman, Victor M Corman, Wolfgang Guggemos, Michael Seilmaier, Sabine Zange, Marcel A Müller, Daniela Niemeyer, et al. 2020. Virological assessment of hospitalized patients with COVID-2019.” Nature 581 (7809): 465–69.

  1. i.e. 95% of the mass of the posterior are below 0.21, where a one unit increase corresponds to an increase of the estimated parameters by the factor 1.23↩︎